{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.算法实践（sklearn）\n",
    "\n",
    "#### 3.1 KNeighborsClassifier 类\n",
    "\n",
    "sklearn 库的 neighbors 模块实现了KNN 相关算法，其中：\n",
    "- `KNeighborsClassifier` 类用于分类问题\n",
    "- `KNeighborsRegressor` 类用于回归问题\n",
    "\n",
    "这两个类的构造方法基本一致，这里我们主要介绍 KNeighborsClassifier 类，原型如下：\n",
    "\n",
    "```python\n",
    "KNeighborsClassifier(\n",
    "\tn_neighbors=5, \n",
    "\tweights='uniform', \n",
    "\talgorithm='auto', \n",
    "\tleaf_size=30, \n",
    "\tp=2, \n",
    "\tmetric='minkowski', \n",
    "\tmetric_params=None, \n",
    "\tn_jobs=None, \n",
    "\t**kwargs)\n",
    "```\n",
    "\n",
    "**来看下几个重要参数的含义：**\n",
    "- n_neighbors：即 KNN 中的 K 值，一般使用默认值 5。\n",
    "- weights：用于确定邻居的权重，有三种方式：\n",
    "    - weights=uniform，表示所有邻居的权重相同。\n",
    "    - weights=distance，表示权重是距离的倒数，即与距离成反比。\n",
    "    - 自定义函数，可以自定义不同距离所对应的权重，一般不需要自己定义函数。\n",
    "- algorithm：用于设置计算邻居的算法，它有四种方式：\n",
    "    - algorithm=auto，根据数据的情况自动选择适合的算法。\n",
    "    - algorithm=kd_tree，使用 KD 树 算法。\n",
    "        - KD 树是一种多维空间的数据结构，方便对数据进行检索。\n",
    "        - KD 树适用于维度较少的情况，一般维数不超过 20，如果维数大于 20 之后，效率会下降。\n",
    "    - algorithm=ball_tree，使用球树算法。\n",
    "        - 与KD 树一样都是多维空间的数据结构。\n",
    "        - 球树更适用于维度较大的情况。\n",
    "    - algorithm=brute，称为暴力搜索。\n",
    "        - 它和 KD 树相比，采用的是线性扫描，而不是通过构造树结构进行快速检索。\n",
    "        - 缺点是，当训练集较大的时候，效率很低。\n",
    "    - leaf_size：表示构造 KD 树或球树时的叶子节点数，默认是 30。\n",
    "            调整 leaf_size 会影响树的构造和搜索速度。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#加载数据集\n",
    "from sklearn.datasets import load_digits\n",
    "digits = load_digits()\n",
    "data = digits.data     # 特征集\n",
    "target = digits.target # 目标集\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#将数据集拆分为训练集（75%）和测试集（25%）:\n",
    "from sklearn.model_selection import train_test_split\n",
    "train_x, test_x, train_y, test_y = train_test_split(\n",
    "    data, target, test_size=0.25, random_state=33)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#构造KNN分类器：\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "# 采用默认参数\n",
    "knn = KNeighborsClassifier() "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#拟合模型：\n",
    "knn.fit(train_x, train_y) \n",
    "\n",
    "#预测数据：\n",
    "predict_y = knn.predict(test_x) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9844444444444445"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#计算模型准确度\n",
    "from sklearn.metrics import accuracy_score\n",
    "score = accuracy_score(test_y, predict_y)\n",
    "score"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "5f04dd1d6b72ff9d002f9c97d3bf130820120c0ac9ec7321437503cd785f0e6e"
  },
  "kernelspec": {
   "display_name": "Python 3.9.7 ('base')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
